30 October 2018

Exploring building characteristics

Motivation

We want to understand which building characteristics influence energy demand and quantify the expected energy consumption.

Work with smart meter and characteristic data for 99 commercial buildings across Australia.

Three years of 15-minute meter data are available.

Smart meter data

Histogram of normalised energy for all commercial buildings.

Building characteristics

We worked with engineers to identify which characteristics are suspected to be the most important factors in energy consumption.

Attribute               BID0045  BID0061  BID0123  BID0210  BID0717  BID0720
CentralDist             FALSE    TRUE     TRUE     TRUE     FALSE    FALSE
DXSystem                FALSE    FALSE    FALSE    TRUE     FALSE    TRUE
ElectricElementHeating  FALSE    FALSE    TRUE     FALSE    TRUE     TRUE
GasFiredBoiler          TRUE     TRUE     FALSE    TRUE     FALSE    FALSE
TenantFeed              FALSE    FALSE    TRUE     FALSE    FALSE    FALSE
WaterCooledCondenser    TRUE     TRUE     TRUE     TRUE     TRUE     FALSE

Outlier detection

Hourly boxplots of normalised energy demand for building BID0107.

Linear mixed model

The data are strongly correlated within buildings, so we use a linear mixed model with buildings treated as random effects and characteristics as fixed effects. A random slope on year captures any trend in each building's consumption.

For a particular season and hour of the day, the demand \(y_{ij}\) for building \(j\) and observation \(i \in \left\{ 1, 2, \ldots n_j\right\}\) is

\[ \begin{gathered} \log y_{ij} = \beta_0 + \sum_{h=1}^p \beta_h x_{hij} + u_{0j} + u_{1j} t_{ij} + \epsilon_{ij}, \\ \begin{bmatrix} u_{0j} \\ u_{1j} \end{bmatrix} \sim N(0, \Omega_u), \quad \Omega_u = \begin{bmatrix} \sigma^2_{u0} & \sigma_{u01} \\ \sigma_{u01} & \sigma^2_{u1} \end{bmatrix}, \\ \epsilon_{ij} \sim N(0, \sigma^2_\epsilon), \end{gathered} \]

where \(\beta_h\) is the coefficient for predictor \(x_{hij}\), \(u_{0j}\) is the random intercept for building \(j\), \(u_{1j}\) is the random slope coefficient for year \(t_{ij}\) and \(\epsilon_{ij}\) are the residuals.
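To make the structure of the model concrete, the following is a minimal simulation sketch for a single building. It is not the fitting code used in the study. All parameter values are hypothetical, the function name `simulate_building` is ours, and the correlation `rho` is an assumed reparameterisation of the covariance with \(\sigma_{u01} = \rho \, \sigma_{u0} \sigma_{u1}\).

```python
import math
import random


def simulate_building(years, x, beta0, betas, sigma_u0, sigma_u1, rho, sigma_eps, rng):
    """Simulate log-demand for one building j under the mixed model.

    years: year index t_ij for each observation i
    x:     the building's 0/1 characteristics (held constant over observations)
    """
    # Draw (u_0j, u_1j) from N(0, Omega_u) using the Cholesky factor of the
    # 2x2 covariance matrix, with sigma_u01 = rho * sigma_u0 * sigma_u1.
    z0, z1 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    u0 = sigma_u0 * z0
    u1 = sigma_u1 * (rho * z0 + math.sqrt(1.0 - rho ** 2) * z1)

    # Fixed-effect part: beta_0 + sum_h beta_h * x_h.
    fixed = beta0 + sum(b * xh for b, xh in zip(betas, x))

    # log y_ij = fixed + u_0j + u_1j * t_ij + eps_ij
    return [fixed + u0 + u1 * t + rng.gauss(0.0, sigma_eps) for t in years]


# Hypothetical values, chosen only for illustration.
rng = random.Random(2018)
log_y = simulate_building(
    years=[0, 0, 1, 1, 2, 2], x=[1, 0], beta0=0.0, betas=[0.3, -0.1],
    sigma_u0=0.2, sigma_u1=0.05, rho=0.3, sigma_eps=0.1, rng=rng,
)
print([round(v, 3) for v in log_y])
```

Every observation from the same building shares the draws `u0` and `u1`, which is what induces the within-building correlation the model is designed to capture.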

Estimating energy profiles

Consider attribute \(h\) for building \(j\) changing from false to true. Then we can use the approximation

\[ Y_{ij} |_{X_{hij} = 1} \approx (Y_{ij} |_{X_{hij} = 0}) e^{\beta_h} \]

The value \(e^{\beta_h}\) is the multiplicative change in demand when \(X_{hij}\) changes from false to true, so the corresponding percentage change is \(100\left(e^{\beta_h} - 1\right)\%\).
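Because the model is linear in \(\log y_{ij}\), switching one attribute on adds \(\beta_h\) on the log scale and multiplies demand by \(e^{\beta_h}\) on the original scale, which corresponds to a percentage change of \(100(e^{\beta_h} - 1)\). A small worked example, using a hypothetical coefficient rather than an estimate from the study:

```python
import math


def percentage_change(beta_h):
    """Percentage change in demand when a 0/1 attribute switches from false to true."""
    return 100.0 * (math.exp(beta_h) - 1.0)


# Hypothetical coefficient, not an estimate from the study.
beta_h = 0.10
print(f"multiplier: {math.exp(beta_h):.3f}")        # multiplier: 1.105
print(f"change: {percentage_change(beta_h):.1f}%")  # change: 10.5%
```

For small coefficients the percentage change is close to \(100 \beta_h\), but the two diverge as \(|\beta_h|\) grows.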

Model selection

For model selection we use the marginal AIC (mAIC), which is the most widely used information criterion for mixed effects models. Vaida and Blanchard (2005) define the mAIC as

\[ mAIC = -2\ell\left(\hat{\mathbf{\beta}}\right) + 2 (p+q), \]

where \(\ell\left(\hat{\mathbf{\beta}}\right)\) is the maximised marginal log-likelihood, \(p\) is the number of fixed effects parameters and \(q\) is the number of variance parameters for the random effects. We choose this criterion because it is simple to understand and has been used in many studies (Müller, Scealy, and Welsh 2013).
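The criterion is straightforward to compute once a model has been fitted. A minimal sketch, with \(p\) and \(q\) counted as in the definition above and with hypothetical input values:

```python
def marginal_aic(log_lik, p, q):
    """mAIC = -2 * log-likelihood + 2 * (p + q)."""
    return -2.0 * log_lik + 2.0 * (p + q)


# Hypothetical fitted model: log-likelihood -1200.5, p = 7, q = 4.
print(marginal_aic(-1200.5, p=7, q=4))  # 2423.0
```

Lower values are preferred: adding a parameter only pays off if it raises the log-likelihood by more than one unit.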

Multimodel inference

Unfortunately, no single model was clearly best under \(mAIC\), so we approach the problem using multimodel inference.

Candidate sets

To construct our candidate set we must first determine the probability of each candidate model. To do so we use Akaike weights. Given \(R\) candidate models the Akaike weight for model \(g_i\) is:

\[ w_i = \frac{\mathcal{L} \left(g_i | \mathbf{x} \right)}{\sum_{r=1}^R \mathcal{L} \left(g_r | \mathbf{x} \right)} = \frac{e^{-\frac{1}{2}\Delta_i}}{\sum_{r=1}^R e^{-\frac{1}{2}\Delta_r}}, \]

where \(\mathcal{L} \left(g_i | \mathbf{x} \right)\) is the likelihood of model \(g_i\) given data \(\mathbf{x}\), and \(\Delta_i = mAIC_i - mAIC_{min}\) is the AIC difference between model \(g_i\) and the best model. We use \(mAIC\) in our analysis as we are dealing with mixed effects models. Other information criteria for mixed effects models, such as the conditional AIC (Greven and Kneib 2010; Vaida and Blanchard 2005), may also be used.
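The weights can be computed directly from the \(mAIC\) values of the candidate models. A minimal sketch, where the function name and the example values are hypothetical:

```python
import math


def akaike_weights(maic_values):
    """Return the Akaike weight w_i for each candidate model."""
    best = min(maic_values)
    # AIC differences: Delta_i = mAIC_i - mAIC_min.
    deltas = [m - best for m in maic_values]
    # Relative likelihood of each model: exp(-Delta_i / 2).
    rel = [math.exp(-0.5 * d) for d in deltas]
    total = sum(rel)
    return [r / total for r in rel]


# Hypothetical mAIC values for three candidate models.
weights = akaike_weights([2423.0, 2424.2, 2430.5])
print([round(w, 3) for w in weights])
```

Subtracting the minimum before exponentiating leaves the weights unchanged but avoids numerical underflow when the \(mAIC\) values are large.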

Parameter estimates

Parameters are estimated by "averaging" the models in our confidence set. There are two common approaches:

  • Natural-model averaging averages over all candidate models where a parameter of interest occurs.
  • Full-model averaging considers all candidate models. If a variable is not selected in one of the candidate models, full-model averaging sets its estimate to zero.

Simulation studies have found that full-model averaging can help to reduce problems caused by model selection bias towards over-complex models (Lukacs, Burnham, and Anderson 2010).

Given a candidate set of \(R\) models our coefficients \(\beta_h\) are estimated by full model averaging:

\[ \hat{\bar{\beta}}_h = \sum^R_{i=1} w_i \hat{\beta}_{hi}, \]

where \(\hat{\beta}_{hi}\) is the estimate of \(\beta_h\) based on model \(g_i\). If \(\beta_h\) is not chosen in model \(g_i\) then \(\hat{\beta}_{hi}\) is defined to equal zero in the above formula.
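The difference between the two averaging approaches can be sketched as follows. The function names and numbers are hypothetical, `None` marks a model in which the variable was not selected, and the natural average renormalises the weights over the models that contain the variable, which is the usual convention and is assumed here.

```python
def full_model_average(weights, estimates):
    """Average over all candidate models, treating a missing estimate as zero."""
    return sum(w * (0.0 if b is None else b) for w, b in zip(weights, estimates))


def natural_model_average(weights, estimates):
    """Average only over models containing the variable, renormalising the weights."""
    kept = [(w, b) for w, b in zip(weights, estimates) if b is not None]
    if not kept:
        raise ValueError("the variable does not occur in any candidate model")
    total = sum(w for w, _ in kept)
    return sum(w * b for w, b in kept) / total


# Hypothetical Akaike weights and estimates of beta_h in three models;
# the second model does not include the variable.
weights = [0.6, 0.3, 0.1]
estimates = [0.2, None, 0.4]
print(round(full_model_average(weights, estimates), 4))     # 0.16
print(round(natural_model_average(weights, estimates), 4))  # 0.2286
```

The full-model average is shrunk towards zero by the weight of the models that exclude the variable, which is how it counteracts selection bias towards over-complex models.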

Unconditional confidence intervals

Once a set of candidate models has been identified, we can construct confidence intervals that reflect both parameter and model selection uncertainty. The \(\left( 1-\alpha \right)100\%\) unconditional confidence interval for a model averaged coefficient \(\hat{ \bar{ \beta}}_h\) is given by

\[ \hat{ \bar{ \beta}}_h \pm z_{1- \alpha/2} \widehat{ \text{ase}}\left( \hat{ \bar{\beta}}_h \right), \]

where \(\widehat{ \text{ase}}\left( \hat{ \bar{\beta}}_h \right)\) is the adjusted standard error from Burnham and White (2002). It is given by

\[ \widehat{ \text{ase}}\left( \hat{ \bar{\beta}}_h \right) = \sum^R_{i=1} w_i \sqrt{ \left( \frac{t_{\text{df}_i, 1-\alpha/2}}{z_{1-\alpha/2}} \right)^2 \widehat{\text{var}} \left( \hat{\beta}_{hi} | g_i \right) + \left( \hat{\beta}_{hi} - \hat{\bar{\beta}}_h \right)^2 }, \]

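where \(\hat{\bar{\beta}}_h\) is the model averaged estimate of \(\beta_h\), \(\widehat{\text{var}} \left( \hat{\beta}_{hi} | g_i \right)\) is the estimated variance of \(\hat{\beta}_{hi}\) in model \(g_i\), \(w_i\) are the Akaike weights, and \(t_{\text{df}_i, 1-\alpha/2}\) and \(z_{1-\alpha/2}\) are the \(1-\alpha/2\) quantiles of the \(t\) distribution with \(\text{df}_i\) degrees of freedom and of the standard normal distribution. The calculation of \(\widehat{\text{var}} \left( \hat{\beta}_{hi} | g_i \right)\) for mixed effects models is reasonably complex and is omitted here (see Bates et al. (2015) for a discussion).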
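Given per-model estimates and variances, the interval can be assembled as in the sketch below. The function names and inputs are hypothetical. Because the Python standard library has no Student \(t\) quantile function, the quantiles \(t_{\text{df}_i, 1-\alpha/2}\) are supplied by the caller, and the example passes the normal quantile instead, which corresponds to the large degrees of freedom limit. For models that exclude the variable we assume an estimate and variance of zero, following full-model averaging.

```python
import math
from statistics import NormalDist


def unconditional_ci(weights, estimates, variances, t_quantiles, alpha=0.05):
    """Model averaged estimate with its unconditional confidence interval.

    estimates[i], variances[i]: estimate of beta_h and its estimated variance in model i
    t_quantiles[i]:             t quantile at 1 - alpha/2 for model i's degrees of freedom
    """
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    beta_bar = sum(w * b for w, b in zip(weights, estimates))
    # Adjusted standard error: weighted sum over models of the square root of
    # the inflated within-model variance plus the squared deviation from beta_bar.
    ase = sum(
        w * math.sqrt((t / z) ** 2 * v + (b - beta_bar) ** 2)
        for w, b, v, t in zip(weights, estimates, variances, t_quantiles)
    )
    return beta_bar, beta_bar - z * ase, beta_bar + z * ase


# Hypothetical inputs for three candidate models; the second excludes the variable.
alpha = 0.10
z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
est, lower, upper = unconditional_ci(
    weights=[0.6, 0.3, 0.1],
    estimates=[0.2, 0.0, 0.4],
    variances=[0.01, 0.0, 0.02],
    t_quantiles=[z, z, z],  # large-df approximation
    alpha=alpha,
)
print(round(est, 3), round(lower, 3), round(upper, 3))
```

The squared deviation term widens the interval when the candidate models disagree about \(\beta_h\), which is how model selection uncertainty enters.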

Results

The solid line shows the estimated tenant feed coefficients for each hourly model. 80% and 90% confidence intervals are indicated by the shaded ribbons.

Profile plots of electric element heating impact. As expected the heating demand mainly plays a role in winter.

Further research

Including more buildings will allow for more confident conclusions to be reached and more characteristics to be investigated.

  • Apply to the residential sector to estimate the impact of demand influences such as solar generation, batteries or other household items.
  • Assess the impact of policies on energy consumption.

Questions?

References

Bates, Douglas, Martin Mächler, Ben Bolker, and Steve Walker. 2015. “Fitting Linear Mixed-Effects Models Using Lme4.” Journal of Statistical Software, Articles 67 (1): 1–48. https://www.jstatsoft.org/v067/i01.

Burnham, Kenneth P, and Gary C White. 2002. “Evaluation of Some Random Effects Methodology Applicable to Bird Ringing Data.” Journal of Applied Statistics 29 (1-4). Taylor & Francis: 245–64. https://doi.org/10.1080/02664760120108755.

Greven, Sonja, and Thomas Kneib. 2010. “On the Behaviour of Marginal and Conditional AIC in Linear Mixed Models.” Biometrika 97 (4). Oxford University Press: 773–89. https://academic.oup.com/biomet/article-abstract/97/4/773/241321.

Lukacs, Paul M, Kenneth P Burnham, and David R Anderson. 2010. “Model Selection Bias and Freedman’s Paradox.” Annals of the Institute of Statistical Mathematics 62 (1). Springer: 117. https://link.springer.com/article/10.1007/s10463-009-0234-4.

Müller, Samuel, J L Scealy, and A H Welsh. 2013. “Model Selection in Linear Mixed Models,” June. http://arxiv.org/abs/1306.2427.

Vaida, Florin, and Suzette Blanchard. 2005. “Conditional Akaike Information for Mixed-Effects Models.” Biometrika 92 (2). Oxford University Press: 351–70. https://academic.oup.com/biomet/article-abstract/92/2/351/233128.